Abstract
Single-cell profiling allows in-depth characterization of the tumor microenvironment and reveals the heterogeneity of pediatric cancers. Easy access to, and analysis of, primary tumor single-cell data from studies of solid and blood pediatric tumors is essential for improving diagnostics and therapies. We have developed the Pediatric Single Cell Cancer Atlas (PedSCAtlas) with the goal of highlighting the heterogeneity of malignant and microenvironment cells across pediatric cancers. The atlas allows quick exploration, comparative analysis, visualization, and prognostic assessment of genes/pathways across pediatric cancers without requiring bioinformatics analysis or computational support.
The PedSCAtlas currently contains expression data for >1.2 million single cells (sc) and single nuclei (sn) from 134 pediatric hematological and solid-organ cancer samples and healthy bone marrow (BM). The hematological cancer samples are dominated by acute leukemias, with scRNA-seq data from acute myeloid leukemia (AML), B-cell acute lymphoblastic leukemia (B-ALL), T-cell ALL (T-ALL), and mixed phenotype acute leukemia (MPAL). These scRNA-seq data were generated from BM samples collected at clinical diagnosis. The data for the atlas were either generated at the Bhasin lab or collected from public resources including the GEO database, ScPCA (https://scpca.alexslemonade.org/), and HCA (https://data.humancellatlas.org/). The atlas also contains bulk RNA-sequencing data and clinical information retrieved from the TARGET Initiative (https://ocg.cancer.gov/programs/target).
The single cell/nuclei datasets were processed using a uniform workflow that includes quality control and filtering, normalization, integration with batch correction (if needed), dimensionality reduction, clustering, and cell-type annotation. Data normalization and integration were performed using the sctransform (Hao and Hao et al, Cell 2021) and harmony (Korsunsky et al, Nature Methods 2019) algorithms, respectively. The integrated pediatric cancer single-cell data were visualized using the Uniform Manifold Approximation and Projection (UMAP) method (Becht et al, Nature Biotech 2018). The clusters were annotated based on the expression of cell-specific markers. Differential expression analysis was used to identify potential biomarker sets for the malignant cells of each disease subtype. To further ensure that these biomarkers were specific to malignant cells (blasts) in the acute leukemia subtypes, genes with consistent expression in healthy BM cell types were excluded. To compare gene expression profiles between single-cell and bulk data, the raw counts of the bulk RNA-seq datasets were filtered and normalized using the Voom algorithm (Ritchie et al, Nucleic Acids Research 2015). The resource has modules for biomarker analysis, bulk RNA deconvolution, and survival analysis to facilitate analysis of the single-cell data. Survival analysis is available for the bulk RNA-seq dataset; survival is calculated from the overall survival values and the gene expression/enrichment input by the user. The SingleCell module allows users to visualize the pediatric cancer dataset on a UMAP by selecting the aspect (i.e., cell type) by which to group the dataset; it also allows users to enter a gene name or select a pathway of interest and view its expression or enrichment in a feature or violin plot. The ImmuneCell module allows users to interact with the adult healthy BM dataset by entering a gene and viewing its expression on a feature plot.
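As an illustration of the blast-specificity filter described above, the following is a minimal Python sketch, not the atlas's actual implementation (which is written in R). The function name, the input layout, the gene and cell-type labels, and the 10% cutoff are all hypothetical; the abstract does not state how "consistent expression" in healthy BM is defined.

```python
def filter_blast_specific(candidates, healthy_fraction, max_fraction=0.10):
    """Keep candidate biomarkers that are not consistently expressed in healthy BM.

    candidates: gene names from a differential expression analysis.
    healthy_fraction: gene -> {cell type -> fraction of healthy BM cells
        of that type expressing the gene}.
    max_fraction: assumed cutoff; a gene is dropped if any healthy cell
        type exceeds it.
    """
    kept = []
    for gene in candidates:
        # Genes absent from the healthy reference are treated as unexpressed there.
        per_type = healthy_fraction.get(gene, {})
        if all(frac <= max_fraction for frac in per_type.values()):
            kept.append(gene)
    return kept


# Toy input with placeholder gene and cell-type labels.
healthy = {
    "GENE_A": {"type_1": 0.02, "type_2": 0.01},
    "GENE_B": {"type_1": 0.85, "type_2": 0.03},
}
blast_markers = filter_blast_specific(["GENE_A", "GENE_B", "GENE_C"], healthy)
```

In this toy input, GENE_B is dropped because one healthy cell type expresses it in most cells, while GENE_A and GENE_C are kept.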
The Biomarkers module contains pre-defined biomarker sets for each disease subtype in the selected cancer. The final module, BulkExpression, contains the bulk RNA-seq data for the selected cancer type and allows users to view gene expression or analyze the survival associations of an input gene or gene set. The source code of the web resource is written in the R programming language, and the interactive web server is implemented using the R Shiny and shinydashboard packages.
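A survival association of the kind offered by the BulkExpression module can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the resource's R code: the abstract does not specify the estimator or the grouping rule, so the median split and the Kaplan-Meier estimator used here, along with all function names and toy values, are assumptions.

```python
from statistics import median


def split_by_median(expression):
    """Label each sample 'high' or 'low' relative to the median expression."""
    cutoff = median(expression.values())
    return {s: ("high" if v > cutoff else "low") for s, v in expression.items()}


def kaplan_meier(times, events):
    """Kaplan-Meier estimate of overall survival.

    times: follow-up time per sample.
    events: 1 if death was observed at that time, 0 if censored.
    Returns [(time, survival probability)] at each time with an observed death.
    """
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        deaths = sum(1 for ti, ei in zip(times, events) if ti == t and ei == 1)
        leaving = sum(1 for ti in times if ti == t)  # deaths plus censored at t
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


# Toy data: four samples, one censored at time 2.
groups = split_by_median({"s1": 1.0, "s2": 2.0, "s3": 3.0, "s4": 4.0})
curve = kaplan_meier([1, 2, 3, 4], [1, 0, 1, 1])
```

In practice, one curve would be computed per expression group and the groups compared with a statistical test such as the log-rank test, which is omitted here for brevity.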
The PedSCAtlas resource provides a unique and straightforward tool for biomarker identification, analysis of pediatric cancer subtype heterogeneity, and transcriptome profiling of the tumor microenvironment. The resource is available online at https://bhasinlab.bmi.emory.edu/PediatricSC/.
Disclosures
Bhasin:Anxomics LLC: Current Employment, Current equity holder in private company. DeRyckere:Meryx: Other: Equity Ownership. Graham:Meryx: Membership on an entity's Board of Directors or advisory committees, Other: Equity Ownership. Bhasin:Canomiks INC: Other: Equity Ownership.
Author notes
Asterisk with author names denotes non-ASH members.